Earthquake Analysis For Magnitude 7 And Higher Since 1900

by Wanda Chen

=====================================================================

Basic Information about the Data

##  [1] "time"      "latitude"  "longitude" "depth"     "mag"      
##  [6] "magType"   "nst"       "gap"       "dmin"      "rms"      
## [11] "net"       "id"        "updated"   "place"     "type"
## [1] "1960-05-22 19:11:20.000" "1964-03-28 03:36:16.000"
## [3] "2011-03-11 05:46:24.120" "2004-12-26 00:58:53.450"
## [5] "1952-11-04 16:58:30.000" "2010-02-27 06:34:11.530"
## [1] "19:11:20" "03:36:16" "05:46:24" "00:58:53" "16:58:30" "06:34:11"
## [1] 19  3  5  0 16  6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   15.00   28.75   71.98   40.00  675.40
## 
##         deep intermediate      shallow      surface 
##           91          126         1090           11
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.358   7.600   9.600
## 
## class1 class2 class3 class4 class5 class6 
##    885    341     75     13      3      1
## 
## 1900s 1910s 1920s 1930s 1940s 1950s 1960s 1970s 1980s  1990 2000s 2010s 
##    30    60   112   126   110   101   139   131   110   155   148    96
##  [1] "time"        "latitude"    "longitude"   "depth"       "mag"        
##  [6] "magType"     "nst"         "gap"         "dmin"        "rms"        
## [11] "net"         "id"          "updated"     "place"       "type"       
## [16] "New_Time"    "Date"        "Time"        "Year"        "Month"      
## [21] "Day"         "Hour"        "depth_class" "mag_class"   "long"       
## [26] "long_class"  "lat_class"   "decade"
## 'data.frame':    1318 obs. of  28 variables:
##  $ time       : Factor w/ 1318 levels "1900-07-29T06:59:00.000Z",..: 546 592 1252 1150 468 1226 605 1272 396 442 ...
##  $ latitude   : num  -38.14 60.91 38.3 3.29 52.62 ...
##  $ longitude  : num  -73.4 -147.3 142.4 96 159.8 ...
##  $ depth      : num  25 25 29 30 21.6 22.9 30.3 20 15 15 ...
##  $ mag        : num  9.6 9.3 9 9 8.9 8.8 8.7 8.6 8.6 8.6 ...
##  $ magType    : Factor w/ 6 levels "","ms","mw","mwb",..: 3 3 6 5 3 5 3 6 3 3 ...
##  $ nst        : int  NA NA 541 601 NA 454 NA 499 NA NA ...
##  $ gap        : num  NA NA 9.5 22 NA 17.8 NA 16.6 NA NA ...
##  $ dmin       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ rms        : num  NA NA 1.16 1.17 NA 1.09 NA 1.33 NA NA ...
##  $ net        : Factor w/ 4 levels "atlas","gcmt",..: 3 3 4 4 3 4 3 4 3 3 ...
##  $ id         : Factor w/ 1318 levels "atlas19230901025800",..: 240 207 1287 1188 322 1261 193 1306 381 338 ...
##  $ updated    : Factor w/ 1296 levels "2014-02-11T02:25:27.101Z",..: 160 631 1217 1168 40 1218 219 1222 572 549 ...
##  $ place      : Factor w/ 364 levels "101km SW of Atka, Alaska",..: 52 315 181 236 234 239 270 236 297 89 ...
##  $ type       : Factor w/ 1 level "earthquake": 1 1 1 1 1 1 1 1 1 1 ...
##  $ New_Time   : chr  "1960-05-22 19:11:20.000" "1964-03-28 03:36:16.000" "2011-03-11 05:46:24.120" "2004-12-26 00:58:53.450" ...
##  $ Date       : Date, format: "1960-05-22" "1964-03-28" ...
##  $ Time       : chr  "19:11:20" "03:36:16" "05:46:24" "00:58:53" ...
##  $ Year       : num  1960 1964 2011 2004 1952 ...
##  $ Month      : num  5 3 3 12 11 2 2 4 4 8 ...
##  $ Day        : num  22 28 11 26 4 27 4 11 1 15 ...
##  $ Hour       : num  19 3 5 0 16 6 5 8 12 14 ...
##  $ depth_class: chr  "shallow" "shallow" "shallow" "shallow" ...
##  $ mag_class  : Factor w/ 6 levels "class1","class2",..: 6 5 5 5 4 4 4 4 4 4 ...
##  $ long       : num  287 213 142 96 160 ...
##  $ long_class : chr  "WestH" "WestH" "EastH" "EastH" ...
##  $ lat_class  : chr  "SouthH" "NorthH" "NorthH" "NorthH" ...
##  $ decade     : Factor w/ 12 levels "1900s","1910s",..: 7 7 12 11 6 12 7 12 5 6 ...

Univariate Plots Section

magnitude – 7.5 <= mag < 8.0

magnitude – 8.0 <= mag < 8.5

magnitude > 8.5

Some information about the dataset

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.358   7.600   9.600
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.000   7.000   6.565  10.000  12.000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   15.00   28.75   71.98   40.00  675.40
## 
##   7 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9   8 8.1 8.2 8.3 8.4 8.5 8.6 8.7 
## 289 205 180 119  92  83  80  77  67  34  21  25  13  13   3   4   6   1 
## 8.8 8.9   9 9.3 9.6 
##   1   1   2   1   1
## 
##   1   2   3   4   5   6   7   8   9  10  11  12 
## 108 106 109 108 119  98 104 124  90 117 129 106
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 
## 55 60 66 63 51 45 54 50 53 50 61 50 49 48 68 45 58 53 57 62 51 62 65 42
## 
## EastH WestH 
##   900   418
## 
## NorthH SouthH 
##    720    598

scatterplot to show where earthquakes happened

Other plots that look at the relationship between magnitude, depth, frequency of occurrence, and variance of earthquakes.

## Warning in loop_apply(n, do.ply): Removed 2 rows containing missing values
## (geom_path).

## 
## 1900s 1910s 1920s 1930s 1940s 1950s 1960s 1970s 1980s  1990 2000s 2010s 
##    30    60   112   126   110   101   139   131   110   155   148    96

Univariate Analysis

What is the structure of your dataset?

This is the basic structure of the .csv file that I downloaded:

time - Time when the event occurred. Times are reported in milliseconds since the epoch ( 1970-01-01T00:00:00.000Z), and do not include leap seconds. In certain output formats, the date is formatted for readability.
latitude - Decimal degrees latitude. Positive values for northern latitudes. Negative values for southern latitudes.
longitude - Decimal degrees longitude. Positive values for eastern longitudes. Negative values for western longitudes.
depth - Depth of the event in kilometers.
mag - The magnitude for the event
magType - The method or algorithm used to calculate the preferred magnitude for the event. Includes “Md”, “Ml”, “Ms”, “Mw”, “Me”, “Mi”, “Mb”, “MLg”.
nst - The total number of seismic stations that reported P- and S-arrival times for this earthquake.
gap - The largest azimuthal gap between azimuthally adjacent stations (in degrees). In general, the smaller this number, the more reliable is the calculated horizontal position of the earthquake.
dmin - Horizontal distance from the epicenter to the nearest station (in degrees). 1 degree is approximately 111.2 kilometers. In general, the smaller this number, the more reliable is the calculated depth of the earthquake.
rms - The root-mean-square (RMS) travel time residual, in sec, using all weights. This parameter provides a measure of the fit of the observed arrival times to the predicted arrival times for this location. Smaller numbers reflect a better fit of the data. The value is dependent on the accuracy of the velocity model used to compute the earthquake location, the quality weights assigned to the arrival time data, and the procedure used to locate the earthquake.
net - The ID of a data contributor. Identifies the network considered to be the preferred source of information for this event. The value includes: ak, at, ci, hv, ld, mb, nc, nm, nn, pr, pt, se, us, uu, uw.
id - A (generally) two-character network identifier with a (generally) eight-character network-assigned code.
updated - Time when the event was most recently updated.
place - Textual description of named geographic region near to the event. This may be a city name, or a Flinn-Engdahl Region name.

The variables that were created from the dataset -

Date - the date when the event happened
Year - the year when the event happened (4 digits value; 1900 - 2015)
Month - the month when the event happened (1-2 digits value; 1-12)
Day - the day when the event happened (1-2 digits value; 1-31)
Hour - the hour when the event happened (1-2 digits value; 0-23)
depth_class - different zones for the depth of the epicenter:
depth = 0 – character string - “surface”
0 < depth < 70 – character string - “shallow”
70 <= depth <= 300 – character string - “intermediate”
depth > 300 – character string - “deep”
The zones (except depth = 0) follow the USGS definitions.
long - another way to express the longitude, on a 0 to 360 scale, so that the center of the graph is 180 instead of 0
mag_class - the class that shows the group for the magnitude (six classes, matching the class1 to class6 counts in the output above):
class 1 - magnitude < 7.5
class 2 - 7.5 <= magnitude < 8.0
class 3 - 8.0 <= magnitude < 8.5
class 4 - 8.5 <= magnitude < 9.0
class 5 - 9.0 <= magnitude < 9.5
class 6 - magnitude >= 9.5
long_class - the class that shows the group for the longitude, based on the shifted long field (0 to 360)
WestH - long > 180.0
PrimeMeridian - long = 180.0 (this is the label used in the data; 180 degrees is actually the antimeridian, not the prime meridian)
EastH - long < 180.0
lat_class - the class that shows the group for the latitude
NorthH - latitude > 0
Equator - latitude = 0
SouthH - latitude < 0
decade - the decade that the Year field falls in. It ranges from the 1900s to the 2010s.
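The class definitions above could be derived roughly as follows. This is a minimal base R sketch, not the original analysis code: the data frame name `ef` is taken from the output above, the four sample rows are hypothetical, and the six magnitude bins and the latitude split at 0 degrees are assumptions chosen to match the class counts and hemisphere labels shown in the output.

```r
# Hypothetical sample rows; the real analysis would read the downloaded .csv.
ef <- data.frame(
  latitude  = c(-38.14, 60.91, 38.30, 3.29),
  longitude = c(-73.4, -147.3, 142.4, 96.0),
  depth     = c(25, 0, 120, 450),
  mag       = c(9.6, 9.3, 7.2, 8.1)
)

# Depth zones: 0 = surface, (0, 70) = shallow, [70, 300] = intermediate, > 300 = deep.
ef$depth_class <- ifelse(ef$depth == 0, "surface",
                  ifelse(ef$depth < 70, "shallow",
                  ifelse(ef$depth <= 300, "intermediate", "deep")))

# Magnitude classes in 0.5 steps; right = FALSE makes each bin [lower, upper).
ef$mag_class <- cut(ef$mag,
                    breaks = c(-Inf, 7.5, 8.0, 8.5, 9.0, 9.5, Inf),
                    labels = paste0("class", 1:6),
                    right  = FALSE)

# Shift longitude from [-180, 180] to [0, 360) so the Pacific sits mid-graph.
ef$long <- ef$longitude %% 360

# Hemisphere labels: longitude is split at 180 on the shifted scale,
# latitude at 0 (assumed, to match the NorthH/SouthH labels in the data).
ef$long_class <- ifelse(ef$long > 180, "WestH",
                 ifelse(ef$long < 180, "EastH", "PrimeMeridian"))
ef$lat_class  <- ifelse(ef$latitude > 0, "NorthH",
                 ifelse(ef$latitude < 0, "SouthH", "Equator"))

ef[, c("depth_class", "mag_class", "long", "long_class", "lat_class")]
```

Using `cut()` with `right = FALSE` keeps the boundary values (7.5, 8.0, and so on) in the higher class, which is what the inequalities in the list describe.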

What is/are the main feature(s) of interest in your dataset?

The recent Nepal earthquake made me think about the relationship between the magnitude, the depth of the epicenter, and the location. I want to know more about how location, depth, and magnitude are correlated, since quite a number of recent earthquakes have been shallow earthquakes with a high magnitude.
I am also interested in the month of the earthquake, and I want to know whether the positions of the Sun, Moon, and Earth may have a relationship to earthquakes.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

The magType field may help in my investigation. Unfortunately, there is no fault line information; I think that fault line information would help even more than the magType.

Did you create any new variables from existing variables in the dataset?

Yes.
The time field in the dataset includes both date and time. I am interested in the year and month, so I split the time field into Date and Time. The Date field is split into 3 fields - Year, Month, Day. From the Time field I extracted only the Hour.
depth_class - a field that shows the depth class (surface, shallow, intermediate, deep) according to the depth.
mag_class - a field that shows the magnitude class (separated by ranges of 0.5 magnitude).
long_class - a field that describes the longitude on the earth (180 degrees as the divider). West Hemisphere, East Hemisphere, Prime Meridian.
lat_class - similar to long_class, but it describes the latitude (0 degrees is the divider). North Hemisphere, South Hemisphere, Equator.
decade - the decade that the year falls in.
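The time split described above can be sketched in base R as below. This is an illustration under assumptions, not the original code: the two timestamps are sample values in the same form as the time column, and the intermediate variable names simply mirror the column names in the output above.

```r
# Sample timestamps in the same ISO 8601 form as the `time` column.
time_raw <- c("1960-05-22T19:11:20.000Z", "2011-03-11T05:46:24.120Z")

# Replace the "T" separator with a space and drop the trailing "Z".
New_Time <- sub("Z$", "", sub("T", " ", time_raw, fixed = TRUE))

# The date is the first 10 characters, the clock time characters 12 to 19.
Date <- as.Date(substr(New_Time, 1, 10))
Time <- substr(New_Time, 12, 19)

# Numeric calendar parts used in the later summaries.
Year  <- as.numeric(format(Date, "%Y"))
Month <- as.numeric(format(Date, "%m"))
Day   <- as.numeric(format(Date, "%d"))
Hour  <- as.numeric(substr(Time, 1, 2))

# Decade label: round the year down to a multiple of 10 and append "s".
decade <- paste0(floor(Year / 10) * 10, "s")

data.frame(Date, Time, Year, Month, Day, Hour, decade)
```

Extracting the pieces with `substr()` avoids any dependence on the local time zone, since the timestamps are never converted to a date-time class.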

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I adjusted the dataset so that the date is split off from the time field. The Date field is further split into 3 fields - Year, Month, Day - which makes the data set easier to analyze.
The data show that earthquakes of magnitude 9 or higher are rare.
Large earthquakes of magnitude 8.5 or higher with a depth of more than 100 km are also uncommon.

Bivariate Plots Section

## 
##  Pearson's product-moment correlation
## 
## data:  mag and depth
## t = -1.1972, df = 1316, p-value = 0.2315
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08682423  0.02105088
## sample estimates:
##         cor 
## -0.03298273

## 
##  Pearson's product-moment correlation
## 
## data:  mag and depth
## t = -0.9579, df = 1099, p-value = 0.3383
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08781253  0.03024934
## sample estimates:
##         cor 
## -0.02888232

## 
##  Pearson's product-moment correlation
## 
## data:  mag and depth
## t = -1.1972, df = 1316, p-value = 0.2315
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08682423  0.02105088
## sample estimates:
##         cor 
## -0.03298273

## 
##  Pearson's product-moment correlation
## 
## data:  mag and depth
## t = 0.6564, df = 89, p-value = 0.5133
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1385166  0.2714726
## sample estimates:
##        cor 
## 0.06940823

Other possible variables summary

##      ms  mw mwb mwc mww 
##   7  94 907  46 193  71

##  atlas   gcmt iscgem     us 
##      7      1    731    579
## ef$decade: 1900s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.425   7.700   7.640   7.800   8.300 
## -------------------------------------------------------- 
## ef$decade: 1910s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.200   7.400   7.498   7.800   8.300 
## -------------------------------------------------------- 
## ef$decade: 1920s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.000   7.200   7.296   7.400   8.400 
## -------------------------------------------------------- 
## ef$decade: 1930s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.374   7.600   8.500 
## -------------------------------------------------------- 
## ef$decade: 1940s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.378   7.600   8.600 
## -------------------------------------------------------- 
## ef$decade: 1950s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.362   7.500   8.900 
## -------------------------------------------------------- 
## ef$decade: 1960s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.371   7.600   9.600 
## -------------------------------------------------------- 
## ef$decade: 1970s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00    7.00    7.20    7.31    7.50    8.10 
## -------------------------------------------------------- 
## ef$decade: 1980s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.025   7.200   7.255   7.400   8.200 
## -------------------------------------------------------- 
## ef$decade: 1990
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.299   7.500   8.300 
## -------------------------------------------------------- 
## ef$decade: 2000s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.350   7.422   7.600   9.000 
## -------------------------------------------------------- 
## ef$decade: 2010s
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.367   7.525   9.000
## ef$Hour: 0
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.000   7.200   7.355   7.600   9.000 
## -------------------------------------------------------- 
## ef$Hour: 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.303   7.500   8.100 
## -------------------------------------------------------- 
## ef$Hour: 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.250   7.356   7.600   8.300 
## -------------------------------------------------------- 
## ef$Hour: 3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.397   7.700   9.300 
## -------------------------------------------------------- 
## ef$Hour: 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.392   7.600   8.300 
## -------------------------------------------------------- 
## ef$Hour: 5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.467   7.800   9.000 
## -------------------------------------------------------- 
## ef$Hour: 6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.372   7.675   8.800 
## -------------------------------------------------------- 
## ef$Hour: 7
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.000   7.200   7.272   7.400   8.100 
## -------------------------------------------------------- 
## ef$Hour: 8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.200   7.300   7.381   7.600   8.600 
## -------------------------------------------------------- 
## ef$Hour: 9
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.000   7.200   7.296   7.575   8.300 
## -------------------------------------------------------- 
## ef$Hour: 10
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.336   7.500   8.200 
## -------------------------------------------------------- 
## ef$Hour: 11
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00    7.10    7.20    7.37    7.60    8.50 
## -------------------------------------------------------- 
## ef$Hour: 12
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.316   7.400   8.600 
## -------------------------------------------------------- 
## ef$Hour: 13
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.371   7.550   8.300 
## -------------------------------------------------------- 
## ef$Hour: 14
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00    7.10    7.20    7.34    7.60    8.60 
## -------------------------------------------------------- 
## ef$Hour: 15
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.324   7.500   8.300 
## -------------------------------------------------------- 
## ef$Hour: 16
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.025   7.250   7.393   7.600   8.900 
## -------------------------------------------------------- 
## ef$Hour: 17
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.353   7.500   8.500 
## -------------------------------------------------------- 
## ef$Hour: 18
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.353   7.500   8.600 
## -------------------------------------------------------- 
## ef$Hour: 19
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00    7.10    7.30    7.41    7.60    9.60 
## -------------------------------------------------------- 
## ef$Hour: 20
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.351   7.600   8.400 
## -------------------------------------------------------- 
## ef$Hour: 21
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.000   7.300   7.356   7.600   8.200 
## -------------------------------------------------------- 
## ef$Hour: 22
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.357   7.600   8.400 
## -------------------------------------------------------- 
## ef$Hour: 23
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.400   7.371   7.500   8.200
## ef$Month: 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.000   7.200   7.322   7.600   8.300 
## -------------------------------------------------------- 
## ef$Month: 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.281   7.400   8.800 
## -------------------------------------------------------- 
## ef$Month: 3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.367   7.600   9.300 
## -------------------------------------------------------- 
## ef$Month: 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.344   7.525   8.600 
## -------------------------------------------------------- 
## ef$Month: 5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.376   7.600   9.600 
## -------------------------------------------------------- 
## ef$Month: 6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.383   7.700   8.400 
## -------------------------------------------------------- 
## ef$Month: 7
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.369   7.600   8.300 
## -------------------------------------------------------- 
## ef$Month: 8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.333   7.500   8.600 
## -------------------------------------------------------- 
## ef$Month: 9
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.400   7.404   7.600   8.500 
## -------------------------------------------------------- 
## ef$Month: 10
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.200   7.298   7.400   8.500 
## -------------------------------------------------------- 
## ef$Month: 11
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00    7.10    7.30    7.41    7.70    8.90 
## -------------------------------------------------------- 
## ef$Month: 12
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.100   7.300   7.414   7.700   9.000

Investigate how the data differ across the history of the Richter magnitude scale

The Richter magnitude scale was defined in 1935 and became widely used after 1970. The graphs here show how the earthquakes are spread out over three periods: 1) before 1935; 2) between 1935 and 1970; 3) after 1970.
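One way to assign each event to one of these three periods is `cut()` on the year. This is a small sketch with hypothetical years; how the boundary years 1935 and 1970 are assigned is an assumption here (1935 goes to the middle period and 1970 to the last), since the text does not say.

```r
# Hypothetical event years, including both boundary years.
year <- c(1906, 1935, 1960, 1970, 2011)

# cut() uses right-closed intervals by default:
# (-Inf, 1934], (1934, 1969], (1969, Inf].
period <- cut(year,
              breaks = c(-Inf, 1934, 1969, Inf),
              labels = c("before 1935", "1935-1969", "1970 and later"))

# Number of events in each period.
table(period)
```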

Epicenter analysis - look through different depth, magnitude, time

## ef$mag_class: class1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1905    1943    1970    1968    1994    2015 
## -------------------------------------------------------- 
## ef$mag_class: class2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1900    1936    1965    1963    1995    2015 
## -------------------------------------------------------- 
## ef$mag_class: class3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1905    1926    1958    1959    1994    2014 
## -------------------------------------------------------- 
## ef$mag_class: class4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1933    1950    1960    1969    2005    2012 
## -------------------------------------------------------- 
## ef$mag_class: class5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1964    1984    2004    1993    2008    2011 
## -------------------------------------------------------- 
## ef$mag_class: class6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1960    1960    1960    1960    1960    1960

## ef$mag_class: class1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   6.000   6.444  10.000  12.000 
## -------------------------------------------------------- 
## ef$mag_class: class2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.000   7.000   6.839  10.000  12.000 
## -------------------------------------------------------- 
## ef$mag_class: class3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.500   7.000   7.053  10.000  12.000 
## -------------------------------------------------------- 
## ef$mag_class: class4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   4.000   5.077   8.000  11.000 
## -------------------------------------------------------- 
## ef$mag_class: class5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     3.0     3.0     3.0     6.0     7.5    12.0 
## -------------------------------------------------------- 
## ef$mag_class: class6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       5       5       5       5       5       5

## ef$mag_class: class1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    5.00   12.00   11.39   17.00   23.00 
## -------------------------------------------------------- 
## ef$mag_class: class2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     5.0    11.0    11.7    19.0    23.0 
## -------------------------------------------------------- 
## ef$mag_class: class3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    4.00   12.00   11.36   17.00   23.00 
## -------------------------------------------------------- 
## ef$mag_class: class4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00    8.00   14.00   12.38   16.00   19.00 
## -------------------------------------------------------- 
## ef$mag_class: class5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.500   3.000   2.667   4.000   5.000 
## -------------------------------------------------------- 
## ef$mag_class: class6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      19      19      19      19      19      19

Magnitude and Depth Relationship

## Warning in loop_apply(n, do.ply): Removed 110 rows containing missing
## values (stat_smooth).
## Warning in loop_apply(n, do.ply): Removed 110 rows containing missing
## values (geom_point).

## Warning in loop_apply(n, do.ply): Removed 27 rows containing missing
## values (stat_smooth).
## Warning in loop_apply(n, do.ply): Removed 27 rows containing missing
## values (geom_point).

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

The graphs that I picked show a strong relationship between the magnitudes and the locations of the epicenters. A lot of the major earthquakes are located in the “Ring of Fire” area, which runs along both sides of the Pacific Ocean.
The graphs also show strong earthquakes in Southeast Asia and along the China/India border. This is similar to the map that USGS created, which shows that most earthquakes happen along plate boundaries “http://earthquake.usgs.gov/earthquakes/world/seismicity_maps/world.pdf”.
I expected the magnitude and the depth of the epicenter to be correlated, but when I ran the tests, all the correlation coefficients were very close to 0 (for example, r = -0.033 with p = 0.23 over the whole data set). So there is no direct linear relationship between depth and magnitude. There might be some relationship if the data set included the casualties and financial losses of the earthquakes. However, the largest earthquakes (magnitude >= 8.5) do show a small association with depth: they usually occur at shallow or surface depths rather than at intermediate or deep ones.
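The correlation tests reported in the output above can be reproduced in outline with `cor.test()`. The sketch below runs on synthetic data, so its numbers will not match the report; the column names `mag` and `depth` follow the dataset, and the depth > 300 subset corresponds to the “deep” class.

```r
# Synthetic stand-in for the earthquake data (NOT the real catalogue).
set.seed(1)
ef <- data.frame(
  mag   = round(runif(200, 7, 9.6), 1),
  depth = round(runif(200, 0, 675), 1)
)

# Pearson correlation between magnitude and depth over all rows.
overall <- cor.test(ef$mag, ef$depth, method = "pearson")

# The same test restricted to the "deep" class (depth > 300 km).
deep      <- subset(ef, depth > 300)
deep_only <- cor.test(deep$mag, deep$depth, method = "pearson")

# Collect the estimate and p-value of each test for comparison.
rbind(
  overall = c(r = unname(overall$estimate),   p = overall$p.value),
  deep    = c(r = unname(deep_only$estimate), p = deep_only$p.value)
)
```

A coefficient near 0 with a large p-value, as in the report, only rules out a linear relationship; it says nothing about non-linear patterns such as the largest events clustering at shallow depths.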

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

One observation may be just coincidence: the earthquakes that happened in April are all shallow class so far; the earthquakes that happened in October were of the shallow and deep classes; and the earthquakes that happened in December were shallow and intermediate. The rest of the months included all types of earthquakes. Earthquakes of magnitude 9 and above happened in March, May, and December.
The “mw” magnitude type, one of the algorithms used to calculate the preferred magnitude, covers a few of the largest earthquakes; the net (data contributor) for these is “iscgem”.
One of the graphs that I found interesting is the earthquake magnitude class vs. the year it happened. The boxplot shows that the median year of each mag_class is around the 1960s, except for class5 (median 2004), and the means are similar to the medians. The graph for Month vs mag_class shows similar results. Both the mean and the median for magnitude < 8.5 (class1, class2, class3) are around June and July, but for class4 (8.5 <= magnitude < 9.0) the median is in April and the mean is in May. For mag_class vs Hour, the mean for class1 - class4 is around Hour 11 - Hour 12. The median is also around Hour 11 - Hour 12 for class1 - class3, but the median for class4 is Hour 14, which differs from its mean.

What was the strongest relationship you found?

The strongest relationship I found is between the location (latitude and longitude) and the magnitude of the earthquakes. The large earthquakes sit on the “Ring of Fire”, which runs from the southeast of the Pacific Ocean toward the north and all the way around to the southwest of the Pacific Ocean. Also, even though the Richter magnitude scale was defined around 1935, it did not become widely used until after 1970. Across all three periods, the Ring of Fire still has more earthquakes than any other area.

Multivariate Plots Section

More detailed analysis of the magnitude and depth of the epicenter and their occurrence

How earthquakes are spread throughout the years and months

Occurrence of earthquakes through different combined views of mag_class, depth_class, longitude, latitude, magnitude, and depth.

Scatterplot Matrix Analysis for each hemisphere

Mean/median analysis - Hour, Month, and Year/Decade

Investigate Hour with Magnitude Mean/Median

Investigate Month with Magnitude Mean/Median

Investigate decade with Magnitude Mean/Median

## Warning in loop_apply(n, do.ply): Removed 900 rows containing missing
## values (geom_point).

## Warning in loop_apply(n, do.ply): Removed 418 rows containing missing
## values (geom_point).

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

From the bivariate analysis, I already know that most of the earthquakes happened around the “Ring of Fire”. Here, I separated the latitude and longitude, because I wanted to know whether earthquakes happened more often at certain latitudes or longitudes than at others. Looking at the depth of the epicenter, there were many earthquakes at longitudes between 90 and 190 and between 260 and 300 (on the 0 to 360 scale of the long field). Among them, longitudes between 100 and 180 and between 280 and 300 had more deep earthquakes. When I checked a map of the earth, longitude 180 is the International Date Line, which runs through the middle of the Pacific Ocean.

Were there any interesting or surprising interactions between features?

The distributions of the earthquakes over magnitude class and depth class are similar. Although most of the earthquakes are either shallow or intermediate, there were a few exceptions with a magnitude around 7 whose depth belongs to the “deep” class. It looks like Europe and Africa do not have many large earthquakes (magnitude 7 and higher) compared to Asia and America. The exceptions that happened often outside the “Ring of Fire” are Afghanistan and the area that runs along the border of Europe. The northern hemisphere has more large earthquakes than the southern hemisphere (720 vs 598), and the eastern hemisphere has more than twice as many as the western hemisphere (900 vs 418).

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.


Final Plots and Summary

Plot One

Description One

It shows that most of the earthquakes are around the Ring of Fire. Most class 1 earthquakes (magnitude < 7.5) are shallow earthquakes. Among earthquakes of magnitude 7.0 or higher, the higher the magnitude, the rarer the earthquake; magnitudes above 8.0 are uncommon. The graph shows that the right side of the “Ring of Fire” has fewer deep earthquakes than the left side of the ring. Most deep earthquakes are around the ocean area, especially on the left side of the ring.
Large earthquakes are less likely in Europe or Africa.
From this graph, we can also tell that the area around the Pacific Ocean has more earthquakes than the Indian or Atlantic Ocean.

Plot Two

Description Two

These three graphs helped me answer the following questions that I had when I started this project, for earthquakes of magnitude 7 or higher since 1900:
  • Which decade had the largest earthquake? - 1960s (9.6)
  • Which decade had the most earthquakes? - 1990s (155)
  • Which decade had the highest median magnitude? - 1900s (7.70)
  • Which decade had the highest mean magnitude? - 1900s (7.70)
  • Which month had the highest mean magnitude? - December (7.414)
  • Which month had the highest median magnitude? - September (7.4)
  • Which month had the most earthquakes? - November (129)
  • Which month had the fewest earthquakes? - September (90)
  • Which hour had the highest median magnitude? - Hour 23 (7.4)
  • Which hour had the highest mean magnitude? - Hour 5 (7.46)
  • Which hour had the most earthquakes? - Hour 14 (68)
  • Which hour had the fewest earthquakes? - Hour 23 (42)
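
Answers like these come from grouping the magnitudes by decade, month, or hour and summarising each group. A minimal base R sketch is below. The helper `summarise_mag` and the toy data frame `mag_demo` are hypothetical and the values are made up; only the column names `decade` and `mag` follow the data frame in this report.

```r
# Toy data (hypothetical values); column names follow the report's data frame.
mag_demo <- data.frame(
  decade = c("1960s", "1960s", "1990s", "1990s", "1990s"),
  mag    = c(9.6, 7.2, 7.0, 7.4, 7.6)
)

# Count, mean, median and maximum magnitude for each level of a grouping column.
summarise_mag <- function(df, group_col) {
  groups <- split(df$mag, df[[group_col]])
  data.frame(
    group  = names(groups),
    n      = sapply(groups, length),
    mean   = sapply(groups, mean),
    median = sapply(groups, median),
    max    = sapply(groups, max),
    row.names = NULL
  )
}

by_decade <- summarise_mag(mag_demo, "decade")
by_decade

# The group with the most events and the group with the largest event.
by_decade$group[which.max(by_decade$n)]
by_decade$group[which.max(by_decade$max)]
```

The same helper could be called with "Month" or "Hour" as the grouping column, assuming those columns exist in the data frame passed in.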

Plot Three

Description Three

  1. The correlation values are almost 0, so none of the fields appear to be linearly related to each other.
  2. The (latitude, long) graph looks like a sideways, reversed view of the Ring of Fire, which is similar to plot 1 showing where most of the earthquakes occurred.
  3. In the (long, depth) and (long, mag) graphs, it looks like a curve could be drawn on top of each graph, with a few outliers near longitude 200 at depths of more than 400. They also show that many earthquakes of magnitude between 7 and 8 happened at longitudes between 100 and 200 degrees.
  4. The depth column shows that most of the earthquakes belong to the shallow class across all magnitudes, years, months, and hours.
  5. The mag (magnitude) column shows that, over the past 100 years, earthquakes of magnitude between 7 and 8.4 could happen at any hour or in any month.
  6. There seems to be no significant correlation among hour, month, and year.
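
The correlation values in the first point can be reproduced with `cor()`. The sketch below uses a toy data frame `num_demo` with made-up values, so its output does not match the report. The Spearman check at the end is an addition for illustration, not something taken from the original analysis. A Pearson value near 0 only rules out a linear relationship.

```r
# Toy data (hypothetical values); column names follow the report's data frame.
num_demo <- data.frame(
  depth = c(25, 25, 29, 30, 21.6, 580, 120),
  mag   = c(9.6, 9.2, 9.1, 9.1, 9.0, 7.3, 7.1),
  Hour  = c(19, 3, 5, 0, 16, 6, 14)
)

# Pairwise Pearson correlations, using only rows with no missing values.
cor_matrix <- cor(num_demo, use = "complete.obs", method = "pearson")
round(cor_matrix, 2)

# Assumption: Spearman's rank correlation as a check that is less sensitive to outliers.
cor(num_demo$depth, num_demo$mag, method = "spearman")
```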

Reflection

=======================
When I started this project, these were some of the questions that I wanted to answer:
  • Which decades had the largest and the most earthquakes since 1900?
  • Which area had the most earthquakes since 1900?
  • Is there any relationship between magnitude and depth of the epicenter?
  • Which month has had the most earthquakes since 1900? (Maybe it is related to the gravity of the Moon and Sun?)
  • Is there possibly a relationship between magType and magnitude?
  • I didn’t even know whether I could draw any conclusions at all. People cannot control when and where earthquakes happen, and if there is a large earthquake in Japan and, a few days later, another large earthquake in China or in the Java Sea, we cannot really conclude that they are somehow indirectly related to each other.

Where did I run into difficulties in the analysis?

There were many difficulties for this project (before, during and after):
  • Before I started this project, I wasn’t sure what kind of project I would do. It took me over a week to decide and search for a dataset.
  • When I decided to do this project, I was just interested in the area of earthquakes and in how many large earthquakes have happened so far. I never really thought that I would be able to draw any conclusions from the dataset. I also wasn’t sure:
    1. how large an earthquake must be to count as large.
    2. what year range I should consider.
    3. what kinds of questions I should try to answer.
    4. whether I would be able to find the answers that I wanted.
    After finding the dataset and starting to investigate the data, I went back and forth on whether I should change the data selection, such as the range of magnitudes or the range of events.
  • There are only a few numerical fields and a few categorical fields that can be used for analysis. Many of the fields I created came after some struggle, when I was not sure how to proceed; I then created them to help me view the data in different ways.
  • Because each earthquake is independent of the others (unless a data field shows it was an aftershock of a major one), it is really difficult to create some graphs, such as line or bar graphs. A scatter plot seemed to be the best choice. Also, every record is a discrete event, so it does not really make sense to find the mean or quartiles.
  • After creating so many graphs, I wasn’t sure what kind of analysis or conclusions I could draw from them. The basic data was created by USGS, so I can’t really change some of the information. Some fields looked interesting and I thought I might be able to use them in the analysis, but it turned out that many of those variables have missing values.
  • There is a lot of repeated code because I wanted to make sure that, when I compared each group, the plots shared the same basic template even though they used different variables.
  • I cannot really conclude that my analysis will hold for predicting future earthquakes. Because the fault lines and the causes of each earthquake are unknown in this dataset, it feels like the Earth has a mind of its own.

Where did I find successes?

The first success was when I created the plot 1 graph, which is similar to the one USGS created; it also helped me resolve many of the difficulties I had while working on this project. Although I did not include the map, plate boundaries, and active volcano locations, I feel I did the best I could. It shows that most of the earthquakes happen around “the Ring of Fire”.
This project helped me understand more about how to create different graphs using ggplot. Some graphs are just less suitable for this dataset than others.
When I found out that most of the earthquakes are shallow and that deep earthquakes are rare, it helped explain to me why many earthquakes cause severe damage and casualties when we hear, from the news or other datasets, that an earthquake is magnitude 7 or higher and shallow. Also, throughout the years, every year has had at least 1 large earthquake of magnitude 7 or higher.
And for the past 100+ years, no earthquake of magnitude 7 or higher has been recorded at either the North Pole or the South Pole.
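
The claim that every year had at least one large earthquake can be checked by looking for gaps in the `Year` column. The sketch below uses a made-up vector `years_demo` and assumes the years are stored as numbers; if `Year` is stored as text or a factor in the real data frame, it would need to be converted first.

```r
# Toy vector of event years (hypothetical); the report's data frame stores this in "Year".
years_demo <- c(1900, 1901, 1901, 1903, 1904)

# Assumption: the period of interest is the full span of years present in the data.
all_years <- seq(min(years_demo), max(years_demo))

# Years inside the span that have no recorded event.
missing_years <- setdiff(all_years, years_demo)
missing_years
```

An empty result would support the claim for the span covered by the data.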

How could the analysis be enriched in future work?

I got this dataset through the USGS website. Unfortunately, this data collection does not include some important information. If it included:
  • Fault line - then maybe we could conclude, or better predict, which fault line has a higher probability of large earthquakes.
  • Phase of the Moon - we might be able to determine whether it has some kind of relationship with the occurrence of earthquakes.
  • Location of volcanoes - it could help decide whether volcanic activity affects earthquakes or fault lines.
USGS probably has all this information, but it probably requires combining different datasets. It would be nice to include these fields in the search form to help future researchers.